Abstract: Online videos have been generated at an unprecedented rate in recent years. As a result, generating personalized recommendations from this large volume of videos has become increasingly challenging. In this paper, we propose extracting non-textual content from the videos themselves to enhance personalized video recommendation. This change in content type leads us to study three issues. The first issue is which non-textual content is helpful. Because users are attracted to videos for different reasons, multiple audio and visual features are extracted, encoded, and transformed to represent video content in a recommender system for the first time. The second issue is how to use the non-textual content to generate accurate personalized recommendations. We reproduce existing methods and find that they do not perform well with non-textual content because of a mismatch between the features and the learning methods. To address this problem, we propose a new method. Our experiments show that the proposed method is more accurate whether the video content features are non-textual or textual.
Abstract: The large number of user-generated videos uploaded to the Internet every day has led to many commercial video search engines, which mainly rely on text metadata for search. However, metadata is often lacking for user-generated videos, so these videos are unsearchable by current search engines. Content-based video retrieval (CBVR) tackles this metadata-scarcity problem by directly analyzing the visual and audio streams of each video. CBVR encompasses multiple research topics, including low-level feature design, feature fusion, semantic detector training, and video search/reranking. We present novel strategies for these topics to enhance CBVR in both accuracy and speed under different query inputs, including pure textual queries and query by video examples. Our proposed strategies were incorporated into our submission to the TRECVID 2014 Multimedia Event Detection evaluation, where our system outperformed other submissions on both text queries and video example queries, demonstrating the effectiveness of our proposed approaches.